Methods for motion estimation with adaptive motion accuracy

ABSTRACT

Methods for motion estimation with adaptive motion accuracy of the present invention include several techniques for computing motion vectors of high pixel accuracy with a minor increase in computation. One technique uses fast-search strategies in sub-pixel space that smartly search for the best motion vectors. An alternate technique estimates highly accurate motion vectors using different interpolation filters at different stages in order to reduce computational complexity. Yet another technique uses rate-distortion criteria that adapt according to the different motion accuracies to determine both the best motion vectors and the best motion accuracies. Still another technique uses a VLC table that is interpreted differently at different coding units, according to the associated motion vector accuracy.

This application claims the benefit of Provisional Application No. 60/146,102, filed Jul. 27, 1999.

BACKGROUND OF THE INVENTION

The present invention relates generally to a method of compressing or coding digital video with bits and, specifically, to an effective method for estimating and encoding motion vectors in motion-compensated video coding.

In classical motion estimation, the current frame to be encoded is decomposed into image blocks of the same size, typically blocks of 16×16 pixels, called “macroblocks.” For each current macroblock, the encoder searches for the block in a previously encoded frame (the “reference frame”) that best matches the current macroblock. The coordinate shift between a current macroblock and its best match in the reference frame is represented by a two-dimensional vector (the “motion vector”) of the macroblock. Each component of the motion vector is measured in pixel units.

For example, if the best match for a current macroblock happens to be at the same location, as is the typical case in stationary background, the motion vector for the current macroblock is (0,0). If the best match is found two pixels to the right and three pixels up from the coordinates of the current macroblock, the motion vector is (2,3). Such motion vectors are said to have integer pixel (or “integer-pel” or “full-pel”) accuracy, since their horizontal X and vertical Y components are integer pixel values. In FIG. 1, the vector V₁=(1,1) represents the full-pel motion vector for a given current macroblock.

Moving objects in a video scene do not move in integer pixel increments from frame to frame. True motion can take any real value along the X and Y directions. Consequently, a better match for a current macroblock can often be found by interpolating the previous frame by a factor N×N and then searching for the best match in the interpolated frame. The motion vectors can then take values in increments of 1/N pixel along X and Y and are said to have 1/N pixel (or “1/N-pel”) accuracy.

In “Response to Call for Proposals for H.26L,” ITU-Telecommunications Standardization Sector, Q.15/SG16, doc. Q15-F-11, Seoul, Nov. 98, and “Enhancement of the Telenor proposal for H.26L,” ITU-Telecommunications Standardization Sector, Q.15/SG16, doc. Q15-G-25, Monterey, Feb. 99, Gisle Bjontegaard proposed using ⅓-pel accurate motion vectors and cubic-like interpolation for the H26L video coding standard (the “Telenor encoder”). To do this, the Telenor encoder interpolates or “up-samples” the reference frame by 3×3 using a cubic-like interpolation filter. This interpolated version requires nine times more memory than the reference frame. At a given macroblock, the Telenor encoder estimates the best motion vector in two steps: the encoder first searches for the best integer-pel vector V₁ and then searches for the best ⅓-pixel accurate vector V_(1/3) near V₁. Using FIG. 1 as an example, a total of eight blocks (of 16×16 pixels) in the 3×3 interpolated reference frame are checked to find the best match which, as shown, is the block associated with the motion vector V_(1/3)=(V_(x), V_(y))=(1+⅓, 1). The Telenor encoder has several problems. First, it uses a sub-optimal fast-search strategy and a complex cubic filter (at all stages) to compute the ⅓-pel accurate motion vectors. As a result, the computed motion vectors are not optimal and the memory and computation requirements are very expensive. Further, the Telenor encoder uses rate-distortion criteria whose effective accuracy is fixed at ⅓-pixel and, therefore, does not adapt to select better motion accuracies. Similarly, the Telenor encoder variable-length code (“VLC”) table has an accuracy fixed at ⅓-pixel and, therefore, is not adapted and interpreted differently for different accuracies.

Most known video compression methods estimate and encode motion vectors with ½-pixel accuracy, because early studies suggested that higher or adaptive motion accuracies would increase computational complexity without providing additional compression gains. These early studies, however, did not estimate the motion vectors using optimized rate-distortion criteria, did not exploit the convexity properties of such criteria to reduce computational complexity, and did not use effective strategies to encode the motion vectors and their accuracies.

One such early study was Bernd Girod's “Motion-Compensating Prediction with Fractional-Pel Accuracy,” IEEE Transactions on Communications, Vol. 41, No. 4, pp. 604-612, April 1993 (the “Girod work”). The Girod work is the first fundamental analysis on the benefits of using sub-pixel motion accuracy for video coding. Girod used a simple, hierarchical strategy to search for the best motion vector in sub-pixel space. He also used simple mean absolute difference (“MAD”) criteria to select the best motion vector for a given accuracy. The best accuracy was selected using a formula that is not useful in practice since it is based on idealized assumptions, is very complex, and restricts all motion vectors to have the same accuracy within a frame. Finally, Girod focused only on prediction error energy and did not address how to use bits to encode the motion vectors.

Another early study was Smita Gupta's and Allen Gersho's “On Fractional Pixel Motion Estimation,” Proc. SPIE VCIP, Vol. 2094, pp. 408-419, Cambridge, November 1993 (the “Gupta work”). The Gupta work presented a method for computing, selecting, and encoding motion vectors with sub-pixel accuracy for video compression. The Gupta work disclosed a formula based on mean squared error (“MSE”) and bilinear interpolation, used this formula to find an ideal motion vector, and then quantized such vector to the desired motion accuracy. The best motion vector for a given accuracy was found using the sub-optimal MSE criteria and the best accuracy was selected using the largest decrease in difference energy per distortion bit, which is a greedy (sub-optimal) criteria. A given motion vector was coded by first encoding that vector with ½-pel accuracy and then encoding the higher accuracy with refinement bits. Coarse-to-fine coding tends to require significant bit overhead.

In “On the Optimal Motion Vector Accuracy for Block-Based Motion-Compensated Video Coders,” Proc. IST/SPIE Digital Video Compression: Algorithms and Technologies, pp. 302-314, San Jose, February 1996 (the “Ribas work”), Jordi Ribas-Corbera and David L. Neuhoff modeled the effect of motion accuracy on bit rate and proposed several methods to estimate the optimal accuracies that minimize bit rate. The Ribas work set forth a full-search approach for computing motion vectors for a given accuracy and considered only bilinear interpolation. The best motion vector was found by minimizing MSE and the best accuracy was selected using some formulas derived from a rate-distortion optimization. The motion vectors and accuracies were encoded with frame-adaptive entropy coders, which are complex to implement in real-time applications.

In “Proposal for a new core experiment on prediction enhancement at higher bitrates,” ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, MPEG 97/1827, Sevilla, February 1997 and “Performance Evaluation of a Reduced Complexity Implementation for Quarter Pel Motion Compensation,” ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, MPEG 97/3146, San Jose, January 1998, Ulrich Benzler proposed using ¼-pel accurate motion vectors for the video sequence and more advanced interpolation filters for the MPEG4 video coding standard. Benzler, however, used Girod's fast-search technique to find the ¼-pel motion vectors. Benzler did consider different interpolation filters, but proposed a complex filter at the first stage and a simpler filter at the second stage and interpolated one macroblock at a time. This approach does not require much cache memory, but it is computationally expensive because of its complexity and because all motion vectors are computed with ¼-pel accuracy for all the possible modes in a macroblock (e.g., 16×16, four-8×8, sixteen-4×4, etc.) and only then is the best mode determined. Benzler used the MAD criteria to find the best motion vector, which was fixed to ¼-pel accuracy for the whole sequence, and hence he did not address how to select the best motion accuracy. Finally, Benzler encoded the motion vectors with a variable-length code (“VLC”) table that could be used for encoding ½- and ¼-pixel accurate vectors.

The references discussed above do not estimate the motion vectors using optimized rate-distortion criteria and do not exploit the convexity properties of such criteria to reduce computational complexity. Further, these references do not use effective strategies to encode motion vectors and their accuracies.

BRIEF SUMMARY OF THE INVENTION

One preferred embodiment of the present invention addresses the problems of the prior art by computing motion vectors of high pixel accuracy (also denoted as “fractional” or “sub-pixel” accuracy) with a minor increase in computation.

Experiments have demonstrated that, by using the search strategy of the present invention, a video encoder can achieve significant compression gains (e.g., up to thirty percent in bit rate savings over the classical choices of motion accuracy) using similar levels of computation. Since the motion accuracies are adaptively computed and selected, the present invention may be described as adaptive motion accuracy (“AMA”).

One preferred embodiment of the present invention uses fast-search strategies in sub-pixel space that smartly search for the best motion vectors. This technique estimates motion vectors in motion-compensated video coding by finding a best motion vector for a macroblock. The first step is searching a first set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₁ to find a best motion vector V₂. Next, a second set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₂ is searched to find a best motion vector V₃. Then, a third set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₃ is searched to find the best motion vector of the macroblock.

In an alternate preferred embodiment of the present invention, a technique for estimating highly accurate motion vectors may use different interpolation filters at different stages in order to reduce computational complexity.

Another alternate preferred embodiment of the present invention selects the best vectors and accuracies in a rate-distortion (“RD”) sense. This embodiment uses rate-distortion criteria that adapt according to the different motion accuracies to determine both the best motion vectors and the best motion accuracies.

Still further, another alternate preferred embodiment of the present invention encodes the motion vector and accuracies with an effective VLC approach. This technique uses a VLC table that is interpreted differently at different coding units, according to the associated motion vector accuracy.

The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention, taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

FIG. 1 is a diagram of exemplary full-pel and ⅓-pel locations in velocity space.

FIG. 2 is a flowchart illustrating a prior art method for estimating the best motion vector.

FIG. 3 is a diagram of an exemplary location of motion vector candidates for full-search in sub-pixel velocity space.

FIG. 4 is a flowchart illustrating a full-search preferred embodiment of the method for estimating the best motion vector of the present invention.

FIG. 5 is a diagram of an exemplary location of motion vector candidates for fast-search in sub-pixel velocity space.

FIG. 6 is a flowchart illustrating a fast-search preferred embodiment of the method for estimating the best motion vector of the present invention.

FIG. 7 is a detail flowchart illustrating an alternate preferred embodiment of step 114 of FIG. 6.

FIG. 8 is a graphical representation of experimental performance results of the Telenor encoder with and without AMA in the “Container” video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 9 is a graphical representation of experimental performance results of the Telenor encoder with and without AMA in the “News” video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 10 is a graphical representation of experimental performance results of the Telenor encoder with and without AMA in the “Mobile” video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 11 is a graphical representation of experimental performance results of the Telenor encoder with and without AMA in the “Garden” video sequence, with SIF resolution, and at the frame rate of 15 frames per second.

FIG. 12 is a graphical representation of experimental performance results of the Telenor encoder with and without AMA in the “Garden” video sequence, with QCIF resolution, and at the frame rate of 15 frames per second.

FIG. 13 is a graphical representation of experimental performance results of the Telenor encoder with and without AMA in the “Tempete” video sequence, with SIF resolution, and at the frame rate of 15 frames per second.

FIG. 14 is a graphical representation of experimental performance results of the Telenor encoder with and without AMA in the “Tempete” video sequence, with QCIF resolution, and at the frame rate of 15 frames per second.

FIG. 15 is a graphical representation of experimental performance results of the Telenor encoder with and without AMA in the “Paris Shaked” video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 16 is a graphical representation of experimental performance results of fast-search (“Telenor FSAMA+c”) and full-search (“Telenor AMA+c”) strategies in the “Mobile” video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 17 is a graphical representation of experimental performance results of fast-search (“Telenor FSAMA+c”) and full-search (“Telenor AMA+c”) strategies in the “Container” video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 18 is a graphical representation of experimental performance results of tests using only one reference frame for motion compensation as compared to tests using multiple reference frames for motion compensation in the “Mobile” video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

DETAILED DESCRIPTION OF THE INVENTION

The methods of the present invention are described herein in terms of the motion accuracy being modified at each image block. These methods, however, may be applied when the accuracy is fixed for the whole sequence or modified on a frame-by-frame basis. The present invention is also described as using Telenor's video encoders (and particularly the Telenor encoder) as described in the Background of the Invention. Although described in terms of Telenor's video encoders, the techniques described herein are applicable to any other motion-compensated video coder.

Most video coders use motion vectors with half pixel (or “½-pel”) accuracy and bilinear interpolation. The first version of Telenor's encoder also used ½-pel motion vectors and bilinear interpolation. The latest version of Telenor's encoder, however, incorporated ⅓-pel vectors and cubic-like interpolation because of the additional compression gains. Specifically, at a given macroblock, Telenor's encoder estimates the best motion vector in two steps shown in FIG. 2. First, the Telenor encoder searches for the best integer-pel vector V₁ (FIG. 1) 100. Second, the Telenor encoder searches for the best ⅓-pixel accurate vector V_(1/3) (FIG. 1) near V₁ 102. This second step is shown graphically in FIG. 1 where a total of eight blocks (each having an array of 16×16 pixels) in the 3×3 interpolated reference frame are checked to find the best match. The motion vectors for these eight blocks are represented by the eight solid dots in the grid centered on V₁. In FIG. 1 the best match is the block associated with the motion vector V_(1/3)=(V_(x), V_(y))=(1+⅓, 1).

The technology of the present invention allows the encoder to choose between any set of motion accuracies (for example, ½, ⅓, and ⅙-pel accurate motion vectors) using either a full search strategy or a fast search strategy.

Full-Search AMA Search Strategy

As shown in FIGS. 3 and 4, in the full-search adaptive motion accuracy (“AMA”) search strategy the encoder searches all the motion vector candidates in a grid of ⅙-pixel resolution and a “square radius” (defined herein as a square block defined by a number of pixels up, a number of pixels down, and a number of pixels to both sides) of five pixels as shown in FIG. 3. FIG. 4 shows that the first step of the full-search AMA is to search for the best integer-pel vector V₁ (FIG. 1) 104. In the second step of the full-search AMA, the encoder searches for the best ⅙-pixel accurate vector V_(1/6) (FIG. 3) near V₁ 106. In other words, the full-search AMA modifies the second step of Telenor's process so that the encoder also searches for motion vector candidates in other sub-pixel locations in the velocity space. The objective is to find the best motion vector in the grid, i.e., the vector that points to the block (in the interpolated reference frame) that best matches the current macroblock. Although the full-search strategy is computationally complex since it searches 120 sub-pixel candidates, it shows the full potential of this preferred method of the present invention.
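The candidate count quoted above can be checked with a short enumeration. The following is a minimal sketch, not the encoder's actual code: the function name `full_search_candidates` is hypothetical, and it assumes that the square radius of five is counted in ⅙-pixel grid steps, which is the reading that yields 120 candidates (an 11×11 grid minus the center V₁).

```python
from fractions import Fraction


def full_search_candidates(center, step=Fraction(1, 6), radius=5):
    """Return every grid point within `radius` steps of `center`, excluding `center`.

    Assumption: the square radius is measured in grid steps of size `step`.
    """
    cx, cy = center
    return [
        (cx + dx * step, cy + dy * step)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if (dx, dy) != (0, 0)
    ]


# V1 = (1, 1), as in the FIG. 1 example.
candidates = full_search_candidates((Fraction(1), Fraction(1)))
print(len(candidates))
```

Exact fractions are used so that positions such as 1+⅓ compare equal without rounding error.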

A critical issue in the motion vector search is the choice of a measure or criterion for establishing which block is the best match for the given macroblock. In practice, most methods use either the mean squared error (“MSE”) or mean absolute difference (“MAD”) criteria. The MSE between two blocks is computed by subtracting the pixel values of the two blocks, squaring the pixel differences, and then taking the average. The MAD between two blocks is a similar distortion measure, except that the absolute value of the pixel differences is computed instead of the squares. If two image blocks are similar to each other, the MSE and MAD values will be small. If, however, the image blocks are dissimilar, these values will be large. Hence, typical video coders find the best match for a macroblock by selecting the motion vector that produces either the smallest MSE or the smallest MAD. In other words, the block associated with the best motion vector is the one closest to the given macroblock in an MSE or MAD sense.
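The two distortion measures described above can be written down directly. This is an illustrative sketch in which blocks are plain nested lists of pixel values; the function names `mse` and `mad` are chosen here for convenience and are not taken from any particular encoder.

```python
def _pixel_differences(block_a, block_b):
    """Flatten two equally sized 2-D blocks into a list of pixel differences."""
    return [
        a - b
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    ]


def mse(block_a, block_b):
    """Mean squared error: average of the squared pixel differences."""
    diffs = _pixel_differences(block_a, block_b)
    return sum(d * d for d in diffs) / len(diffs)


def mad(block_a, block_b):
    """Mean absolute difference: average of the absolute pixel differences."""
    diffs = _pixel_differences(block_a, block_b)
    return sum(abs(d) for d in diffs) / len(diffs)


# Two toy 2x2 blocks that differ in a single pixel by 2.
current = [[1, 2], [3, 4]]
reference = [[1, 2], [3, 6]]
```

For the toy blocks, one difference of 2 over four pixels gives an MSE of 4/4 and a MAD of 2/4.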

Unfortunately, the MSE and MAD distortion measures do not take into account the cost in bits of actually encoding the vector. For example, a given motion vector may minimize the MSE, but it may be very costly to encode with bits, so it may not be the best choice from a coding standpoint.

To deal with this, advanced encoders such as those described by Telenor use rate-distortion (“RD”) criteria of the type “distortion+L*Bits” to select the best motion vector. The value of “distortion” is typically the MSE or MAD, “L” is a constant that depends on the compression level (i.e., the quantization step size), and “Bits” is the number of bits required to code the motion vector. In general, any RD criteria of this type would work with the present invention. However, in the present invention “Bits” includes both the bits needed for encoding the vector and those for encoding the accuracy of the vector. In fact, some candidates can have several “Bits” values, because they can have several accuracy modes. For example, the candidate at location (½, −½) can be thought of as having ½ or ⅙-pixel accuracy.
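The “distortion+L*Bits” criterion, with “Bits” covering both the vector and its accuracy mode, can be sketched as follows. The `Candidate` record, the function names, and all distortion and bit counts are illustrative assumptions; only the accuracy-mode lengths (1 bit for ½-pel, 2 bits for ⅙-pel) follow the code lengths of Table 1. The same location (½, −½) is listed under two accuracy modes to show how one candidate can carry several “Bits” values.

```python
from collections import namedtuple
from fractions import Fraction

# Hypothetical record for one (vector, accuracy mode) pair.
Candidate = namedtuple(
    "Candidate", "vector accuracy distortion vector_bits accuracy_bits"
)


def rd_cost(candidate, lagrangian):
    """distortion + L * Bits, where Bits = vector bits + accuracy-mode bits."""
    bits = candidate.vector_bits + candidate.accuracy_bits
    return candidate.distortion + lagrangian * bits


def best_candidate(candidates, lagrangian):
    """Pick the candidate with the smallest RD cost."""
    return min(candidates, key=lambda c: rd_cost(c, lagrangian))


half = Fraction(1, 2)
candidates = [
    # Same location, two accuracy modes: all numbers below are made up.
    Candidate((half, -half), "1/2-pel", 40.0, 4, 1),
    Candidate((half, -half), "1/6-pel", 40.0, 8, 2),
    Candidate((Fraction(2, 3), -half), "1/6-pel", 36.0, 10, 2),
]

choice = best_candidate(candidates, lagrangian=2.0)
```

With a large L the cheaper ½-pel description of the same location wins; with a small L the lower-distortion ⅙-pel candidate becomes the better choice, which is the adaptation the criterion is meant to provide.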

Fast-Search AMA Search Strategy

As shown in FIGS. 5 and 6, in the fast-search adaptive motion accuracy (“AMA”) search strategy the encoder checks only a small set of the motion vector candidates. In the first step of the fast-search AMA, the encoder checks the eight motion vector candidates in a grid of ½-pixel resolution of square radius 1, which is centered on V₁ 108. V₂ is then set to denote the candidate that has the smallest RD cost (i.e., the best of the eight previous vectors and V₁) 110. Next, the encoder checks the eight motion vector locations in a grid of ⅙-pixel resolution of square radius 1 that is now centered on V₂ 112. If V₂ has the smallest RD cost 114, the encoder stops its search and selects V₂ as the motion vector for the block. Otherwise, V₃ is set to denote the best motion vector of the eight 116. The encoder then searches for a new motion vector candidate in the grid of ⅙-pixel resolution of square radius 1 that is centered on V₃ 118. It should be noted that some of the candidates in this grid have already been tested and can be skipped. The candidate with the smallest RD cost in this last step is selected as the motion vector for the block 120.
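Steps 108-120 can be sketched as a three-stage descent. This is a minimal illustration under stated assumptions, not the patented implementation: `fast_search` and `ring` are hypothetical names, `rd_cost` is any caller-supplied function returning the RD cost of a vector, ties at step 114 are resolved in favor of V₂, and the final stage compares V₃ with its neighbors. A cache stands in for the remark that already-tested candidates can be skipped.

```python
from fractions import Fraction


def ring(center, step):
    """Eight neighbours of `center` on a grid of spacing `step` (square radius 1)."""
    cx, cy = center
    return [
        (cx + dx * step, cy + dy * step)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]


def fast_search(v1, rd_cost):
    """Return (best vector, number of distinct RD-cost evaluations)."""
    cache = {}

    def cost(v):
        # Candidates that were already tested are not evaluated again.
        if v not in cache:
            cache[v] = rd_cost(v)
        return cache[v]

    half, sixth = Fraction(1, 2), Fraction(1, 6)

    # Steps 108-110: best of V1 and its eight 1/2-pel neighbours.
    v2 = min([v1] + ring(v1, half), key=cost)

    # Steps 112-114: eight 1/6-pel neighbours of V2; stop if V2 is still best.
    best_neighbour = min(ring(v2, sixth), key=cost)
    if cost(v2) <= cost(best_neighbour):
        return v2, len(cache)

    # Steps 116-120: move to V3 and check its 1/6-pel neighbourhood.
    v3 = best_neighbour
    return min([v3] + ring(v3, sixth), key=cost), len(cache)


# Toy convex cost with its minimum at (1 + 1/3, 1), the FIG. 1 example vector.
target = (Fraction(4, 3), Fraction(1))


def toy_cost(v):
    return (v[0] - target[0]) ** 2 + (v[1] - target[1]) ** 2


best, evaluated = fast_search((Fraction(1), Fraction(1)), toy_cost)
```

The toy cost is only a stand-in for a real “distortion+L*Bits” evaluation; it is convex, which is the property the search relies on.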

Experimental data has shown that, on average, this simple fast search strategy typically checks the RD cost of about eighteen locations in sub-pixel space (ten more than Telenor's search strategy), and hence the overall computational complexity is only moderately increased.

The experimental data discussed below in connection with FIGS. 8-18 show that there is practically no loss in compression performance from using this fast-search version of AMA. This is because the fast-search AMA search strategy exploits the convexity of the “distortion+L*Bits” curve (c.f., “distortion” is known to be convex), by creating a path that smartly follows the RD cost from higher to lower levels.

Alternate embodiments of the invention replace one or more of the steps 108-120. These embodiments have also been effective and have further reduced the number of motion vector candidates to check in the sub-pixel velocity space.

FIG. 7, for example, checks candidates of ⅓-pel accuracy. In this embodiment step 112 is replaced by one of three possible scenarios. First, if the best motion vector candidate from step 110 is at the center of V₁ (the “integer-pel vector”) 130, then the encoder checks three candidates of ⅓-pel accuracy between the center vector and the ½-pel location with the next lowest RD cost 132. Second, if the best motion vector candidate from step 110 is a corner vector 134, then the encoder checks the four vector candidates of ⅓-pel accuracy that are closest to such corner 136. Third, if the best motion vector candidate from step 110 is between two corners 138, then the encoder determines which of these two corners has lower RD cost and checks the four vector candidates of ⅓-pel accuracy that are closest to the line between such corner and the best candidate from step 110 140. It should be noted that in implementing this process step 138 may be unnecessary because if V₂ is neither at the center nor a corner vector, then it would necessarily be between two corners. If the encoder is set to find motion vectors with ⅓-pixel accuracy, FIG. 7 could be modified to end rather than continuing with step 114.

Computation And Memory Savings

Because step 108 checks only motion vector candidates of ½-pixel accuracy, the computation and memory requirements for the hardware or software implementation are significantly reduced. To be specific, in a smart implementation embodiment of this fast-search the reference frame is interpolated by 2×2 in order to obtain the RD costs for the ½-pel vector candidates. A significant amount of fast (or cache) memory for a hardware or software encoder is saved as compared to Telenor's approach that needed to interpolate the reference frame by 3×3. In comparison to the Telenor encoder, this is a cache memory savings of 9/4, or a factor of 2.25. The few additional interpolations can be done later on a block-by-block basis.
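The factor of 2.25 follows from the number of samples in the interpolated frame, which grows with the square of the up-sampling factor. The sketch below works this through for a QCIF luma frame of 176×144 pixels; the function name is hypothetical, and the count ignores chroma and any border padding an actual encoder might add.

```python
def interpolated_samples(width, height, factor):
    """Samples in a frame up-sampled by `factor` in each direction."""
    return (width * factor) * (height * factor)


QCIF_WIDTH, QCIF_HEIGHT = 176, 144  # luma size of a QCIF frame

telenor_samples = interpolated_samples(QCIF_WIDTH, QCIF_HEIGHT, 3)      # 3x3
fast_search_samples = interpolated_samples(QCIF_WIDTH, QCIF_HEIGHT, 2)  # 2x2

savings_factor = telenor_samples / fast_search_samples
```

The ratio is 9/4 regardless of the frame size, since the width and height cancel.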

Additionally, since the interpolations in step 108 are used to direct the search towards the lower values of the RD cost function, a complex filter is not needed for these interpolations. Accordingly, computation power may be saved by using a simple bilinear filter for step 108.

Also, other key coding decisions such as selecting the mode of a macroblock (e.g., 16×16, four-8×8, etc.) can be done using the ½-pel vectors because such decisions do not benefit significantly from using higher accuracies. Then, the encoder can use a more complex cubic filter to interpolate the required sub-pixel values for the few additional vector candidates to check in the remaining steps. Since the macroblock mode has already been chosen, these final interpolations only need to be done for the chosen mode.

Use of multiple filters obtained computation savings of over twenty percent in running time on a Sparc Ultra 10 Workstation in comparison to Telenor's approach, which uses a cubic interpolation all the time. Additionally, the fast-memory requirements were reduced by nearly half. Also, there was little or no loss in compression performance. By comparison, Benzler's technique requires about 70 interpolations per pixel in the Telenor encoder, whereas one preferred embodiment of the fast-search of the present invention requires only about 7 interpolations per pixel.

Coding The Motion Vector And Accuracies With Bits

Once the best motion vector and accuracy are determined, the encoder encodes both the motion vector and accuracy values with bits. One approach is to encode the motion vector with a given accuracy (e.g., half-pixel accuracy) and then add some extra bits for refining the vector to the higher motion accuracy. This is the strategy suggested by B. Girod, but it is sub-optimal in a rate-distortion sense.

In one preferred embodiment of the present invention, the accuracy of the motion vector for a macroblock is first encoded using a simple code such as the one given in Table 1. Any other table with code lengths {1, 2, 2} could be used as well. The bit rate could be further reduced using a typical DPCM approach.

TABLE 1
VLC table to indicate the accuracy mode for a given macroblock.

Code    Motion Accuracy
1       ½-pel
01      ⅓-pel
11      ⅙-pel

Next, the value of the vector(s) in the respective accuracy space is encoded. These bits can be obtained from entries of a single VLC table such as the one used in the H26L codec. The key idea is that these bits are interpreted differently depending on the motion accuracy for the macroblock. For example, if the motion accuracy is ⅓ and the code bits for the X component of the difference motion vector are 00001¹, the X component of the vector is Vx=⅔. If the accuracy is ½, such code corresponds to Vx=1.

¹Observe that this code is the fourth entry (code number 3) of H26L's VLC table in [6].
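The idea of one table read differently per accuracy can be sketched by separating two steps: mapping a table entry to a signed count of increments, and scaling that count by the step size of the decoded accuracy. The signed mapping used below (0 → 0, 1 → +1, 2 → −1, 3 → +2, ...) is an assumption made for illustration; the text above only confirms that code number 3 yields ⅔ at ⅓-pel accuracy and 1 at ½-pel accuracy. The function names are hypothetical, and the bit-level parsing of the code word is left out.

```python
from fractions import Fraction


def code_number_to_increments(code_number):
    """Assumed signed mapping: 0 -> 0, 1 -> +1, 2 -> -1, 3 -> +2, 4 -> -2, ..."""
    magnitude = (code_number + 1) // 2
    return magnitude if code_number % 2 == 1 else -magnitude


def decode_component(code_number, accuracy):
    """Interpret the same table entry in the grid of the given accuracy."""
    return code_number_to_increments(code_number) * accuracy


# Code number 3 read under two different accuracy modes.
vx_third = decode_component(3, Fraction(1, 3))
vx_half = decode_component(3, Fraction(1, 2))
```

Because only the step size changes, the same sketch would cover accuracies that are not multiples of one another, as long as the decoder knows the accuracy before it reads the vector.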

Compared to the Benzler method for encoding the motion vectors with a variable length code (“VLC”) table that could be used for encoding ½- and ¼-pixel accurate vectors, the method of the present invention can be used for encoding vectors of any motion accuracy and the table can be interpreted differently at each frame and macroblock. Further, the general method of the present invention can be used for any motion accuracy, not necessarily those that are multiples of each other or those that are of the type 1/n (with n an integer). The number of increments in the given sub-pixel space is simply counted and the bits in the associated entry of the table are used as the code.

From the decoder's viewpoint, once the motion accuracy is decoded, the motion vector can also be easily decoded. After that, the associated block in the previous frame is reconstructed using a typical 4-tap cubic interpolator. There is a different 4-tap filter for each motion accuracy.

The AMA does not increase decoding complexity, because the number of operations needed to reconstruct the predicted block is the same, regardless of the motion accuracy.

Experimental Results

FIGS. 8-18 show test results of the Telenor encoder with and without AMA in a variety of video sequences, resolutions, and frame rates, as described in Table 2. These figures show rate-distortion (“RD”) plots for each case. The “Anchor” curve shows RD points from optimized H.263+ (FIGS. 8 and 9 only). The “Telenor ½+b” curve shows Telenor with ½-pel vectors and bilinear interpolation (the “classical case”). The “Telenor ⅓” curve shows the current Telenor proposal (the “Telenor encoder”). The “Telenor+AMA+c” curve shows the Telenor encoder with the full-search strategy of the present invention. The “Telenor+FSAMA+c” curve, as shown in FIGS. 15-17, shows the current Telenor encoder with the fast-search strategy. (Unless otherwise specified, the full-search version of AMA was the encoder strategy used in the experiments.) All of the test results were cross-checked at the encoder and decoder. These results show that with AMA the gains in peak signal-to-noise ratio (“PSNR”) can be as high as 1 dB over H26L, and even higher over the classical case.

TABLE 2
Description of the Experiments

Video sequence   FIG. #    Resolution   Frame rate
Container        FIG. 8    QCIF         10
News             FIG. 9    QCIF         10
Mobile           FIG. 10   QCIF         10
Garden           FIG. 11   SIF          15
Garden           FIG. 12   QCIF         15
Tempete          FIG. 13   SIF          15
Tempete          FIG. 14   QCIF         15
Paris Shaked     FIG. 15   QCIF         10

The video sequences are commonly used by the video coding community, except for “Paris Shaked.” The latter is a synthetic sequence obtained by shifting the well-known sequence “Paris” by a motion vector whose X and Y components take a random value within [−1,1]. This synthetic sequence simulates small movements caused by a hand-held camera in a typical video phone scene.

Comparison Of Full-Search And Fast-Search AMA

The experimental results shown in FIGS. 16 and 17 demonstrate that the encoder performance with fast-search (“Telenor FSAMA+c”) and full-search (“Telenor AMA+c”) strategies for AMA is practically the same. This is true because the fast-search strategies exploit the convexity of the RD cost curve in the sub-pixel velocity space. In other words, since the shape of the RD cost follows a smooth convex curve, its minimum should be easy to find with some smart fast-search schemes that descend the curve.

Combining AMA And Multiple Reference Frames

In the plot shown in FIG. 18, the curves labeled “1r” used only one reference frame for the motion compensation, so these curves are the same as those presented in FIG. 10. The curves labeled “5r” used five reference frames.

The experiments show that the gains with AMA add to those obtained using multiple reference frames. The gain from AMA in the one-reference case can be measured by comparing the curve labeled with a “+” (Telenor AMA+c+1r) with the curve labeled with an “x” (Telenor ⅓+1r), and the gain in the five-reference case can be measured by comparing the curve labeled with a “diamond” (Telenor AMA+c+5r) with the curve labeled with a “*” (Telenor ⅓+5r).

It should be noted that the present invention may be implemented at the frame level so that different frames could use different motion accuracies, but within a frame all motion vectors would use the same accuracy. Preferably in this embodiment the motion vector accuracy would then be signaled only once at the frame layer. Experiments have shown that using the best, fixed motion accuracy for the whole frame should also produce compression gains like those presented here for the macroblock-adaptive case.

In another frame-based embodiment the encoder could do motion compensation on the entire frame with the different vector accuracies and then select the best accuracy according to the RD criteria. This approach is not suitable for pipeline, one-pass encoders, but it could be appropriate for software-based or more complex encoders. In still another frame-based embodiment, the encoder could use previous statistics and/or formulas to predict what will be the best accuracy for a given frame (e.g., the formulas set forth in the Ribas work or a variation thereof can be used). This approach would be well-suited for one-pass encoders, although the performance gains would depend on the precision of the formulas used for the prediction.
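The first of these frame-based embodiments can be sketched as follows, assuming an RD cost of the form “distortion+L*Bits” as recited in the claims. The sketch takes the total distortion and total bit count obtained after motion compensating the whole frame at each candidate accuracy and returns the accuracy with the lowest cost. The function names, the dictionary layout and the numbers in the example are hypothetical and are not measured results.

```python
def rd_cost(distortion, bits, lam):
    """Rate-distortion cost of the form distortion + L * bits."""
    return distortion + lam * bits


def select_frame_accuracy(frame_stats, lam):
    """Pick the frame-level motion accuracy with the lowest RD cost.

    frame_stats maps N (meaning 1/N-pel accuracy) to a tuple
    (total distortion, total bits) measured for the whole frame.
    """
    return min(frame_stats, key=lambda n: rd_cost(*frame_stats[n], lam))


# Made-up per-frame statistics: finer accuracy lowers the distortion but
# costs more bits for the motion vectors.
example_stats = {
    1: (5000.0, 300),
    2: (4200.0, 380),
    3: (4000.0, 450),
    6: (3950.0, 560),
}

chosen = select_frame_accuracy(example_stats, lam=2)
```

With these made-up numbers, a large multiplier favours coarse accuracy because bits are expensive, and a small multiplier favours fine accuracy. The chosen value of N would then be signaled once for the frame.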

The terms and expressions which have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims that follow.

What is claimed is:
1. A fast-search adaptive motion accuracy search method for estimating motion vectors in motion-compensated video coding by finding a best motion vector for a macroblock, said method comprising the steps of: (a) searching a first set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₁ to find a best motion vector V₂ using a first criteria; (b) searching a second set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₂ to find a best motion vector V₃ using a second criteria; (c) searching a third set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₃ to find said best motion vector of said macroblock using a third criteria; and (d) wherein at least one of said first criteria, said second criteria, and said third criteria is a rate-distortion criteria.
2. The method of claim 1, said step of searching a first set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₁ to find a best motion vector V₂ further comprising the step of searching a first set of eight motion vector candidates in a grid of ½-pixel resolution of square radius 1 centered on V₁ to find a best motion vector V₂.
3. The method of claim 1, said step of searching a second set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₂ to find a best motion vector V₃ further comprising the step of searching a second set of eight motion vector candidates in a grid of ⅙-pixel resolution of square radius 1 centered on V₂ to find a best motion vector V₃.
4. The method of claim 1 further comprising the steps of using V₂ as the motion vector for the macroblock if V₂ has the smallest rate-distortion cost and skipping step (c) of claim 1.
5. The method of claim 1, said step of searching a third set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₃ to find said best motion vector of said macroblock further comprising the step of searching a third set of eight motion vector candidates in a grid of ⅙-pixel resolution of square radius 1 centered on V₃ to find said best motion vector of said macroblock.
6. The method of claim 1, said step of searching a third set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₃ to find said best motion vector of said macroblock further comprising the step of skipping motion vector candidates of said third set of motion vector candidates that have already been tested.
7. The method of claim 1 further wherein said step of searching said first set of motion vector candidates further comprises the step of searching said first set of motion vector candidates using a first filter to do a first interpolation, said step of searching said second set of motion vector candidates further comprises the step of searching said second set of motion vector candidates using a second filter to do a second interpolation, and said step of searching said third set of motion vector candidates further comprises the step of searching said third set of motion vector candidates using a third filter to do a third interpolation.
8. The method of claim 1, said step of searching a second set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₂ to find a best motion vector V₃ further comprising the steps of: (a) searching three candidates of ⅓-pel accuracy V₂ and a ½-pel location with the next lowest rate-distortion cost if V₂ is at the center; (b) searching four vector candidates of ⅓-pel accuracy that are closest to V₂ if V₂ is a corner vector; and (c) determining which of two corners has lower rate-distortion cost and searching four vector candidates of ⅓-pel accuracy that are closest to a line between said corner with lower rate-distortion cost, if V₂ is between two corners vectors.
9. An adaptive motion accuracy search method for estimating motion vectors in motion-compensated video coding by finding a best motion vector for a macroblock, said method comprising the steps of: (a) searching a first set of motion vector candidates in a grid centered on V₁ using a first criteria to find a best motion vector V₂ using a first filter to do a first interpolation; (b) searching a second set of motion vector candidates in a grid centered on V₂ using a second criteria to find a best motion vector V₃ using a second filter to do a second interpolation; and (c) searching a third set of motion vector candidates in a grid centered on V₃ using a third criteria to find said best motion vector of said macroblock using a third filter to do a third interpolation; (d) wherein at least one of said first criteria, said second criteria, and said third criteria is a rate-distortion criteria.
10. The method of claim 9 wherein said step of searching using a first filter to do a first interpolation further comprises using a simple filter to do a coarse interpolation.
11. The method of claim 9 wherein said step of searching using a first filter to do a first interpolation further comprises using a simple filter to do a coarse interpolation and said step of searching using a second filter to do a second interpolation further comprises using a complex filter to do a fine interpolation.
12. The method of claim 11 wherein said step of searching using a third filter to do a third interpolation further comprises using a complex filter to do a fine interpolation.
13. The method of claim 9 wherein said step of searching using a first filter to do a first interpolation further comprises using a bilinear filter to interpolate the reference frame by 2×2.
14. The method of claim 9 wherein said step of searching using a first filter to do a first interpolation further comprises using a bilinear filter to interpolate the reference frame by 2×2 and said step of searching using a second filter to do a second interpolation further comprises using a cubic filter to do a fine interpolation.
15. The method of claim 14 wherein said step of searching using a third filter to do a third interpolation further comprises using a cubic filter to do a fine interpolation.
16. An adaptive motion accuracy search method for estimating motion vectors in motion-compensated video coding by finding a best motion vector for a macroblock, said method comprising the steps of: (a) searching at a first motion accuracy for a first best motion vector of said macroblock; (b) encoding said first best motion vector and said first motion accuracy; (c) searching for at least one second best motion vector of said macroblock at an at least one second motion accuracy; (d) encoding said at least one second best motion vector and said at least one second motion accuracy; and (e) selecting the best motion vector of said first and at least one second best motion vectors using rate-distortion criteria.
17. The method of claim 16 wherein said step of selecting the best motion vector using rate-distortion criteria further comprises the step of said rate-distortion criteria adapting according to the different motion accuracies to determine both the best motion vectors and the best motion accuracies.
18. The method of claim 16, said step of searching for at least one second best motion vector at an at least one second motion accuracy further comprising the step of searching for at least one second best motion vector of said macroblock at an at least one second motion accuracy that is finer than said first motion accuracy.
19. The method of claim 16 wherein said step of selecting the best motion vector using rate-distortion criteria further comprises the step of using rate-distortion criteria of the type “distortion+L*Bits” to select the best motion vector.
20. An adaptive motion accuracy search method for estimating motion vectors in motion-compensated video coding by finding a best motion vector for a macroblock, said method comprising the steps of: (a) searching at a motion accuracy for a best motion vector of said macroblock using rate-distortion criteria; (b) encoding said motion accuracy using a code from a VLC table that is interpreted differently at different coding units according to the associated motion vector accuracy; and (c) encoding said best motion vector in the respective accuracy space.
21. A system for estimating motion vectors in motion-compensated video coding by finding a best motion vector for a macroblock, said system comprising: (a) a first encoder for searching a first set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₁ using a first criteria to find a best motion vector V₂; (b) a second encoder for searching a second set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₂ using a second criteria to find a best motion vector V₃; and (c) a third encoder for searching a third set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₃ using a third criteria to find said best motion vector of said macroblock; (d) wherein at least one of said first criteria, said second criteria, and said third criteria is a rate-distortion criteria.
22. The system of claim 21 wherein said first, second, and third encoders are a single encoder.
23. A fast-search adaptive motion accuracy search method for estimating motion vectors in motion-compensated video coding by finding a best motion vector for a macroblock, said method comprising the steps of: (a) searching a first set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₁ to find a best motion vector V₂; (b) searching a second set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₂ to find a best motion vector V₃; (c) searching a third set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on V₃ to find said best motion vector of said macroblock, and (d) using V₂ as the motion vector for the macroblock if V₂ has the smallest rate-distortion cost and skipping step (c).
24. The method of claim 1, wherein said first criteria, said second criteria, and said third criteria are all rate-distortion criteria.
25. The method of claim 9, wherein said first criteria, said second criteria, and said third criteria are all rate-distortion criteria.
26. The system of claim 21, wherein said first criteria, said second criteria, and said third criteria are all rate-distortion criteria.
27. A method of performing, at a video encoder, motion-compensated video encoding of an input image having respective frames each of which is decomposed into blocks, comprising: searching, block by block, for a motion vector that represents an amount of movement from a corresponding position of a reference frame in an objective current block for each of the blocks decomposed from a frame of the input image, compensating a motion using the motion vector, the motion vector having fractional accuracy expressed by 1/N pel (where N is an integer ≧2), encoding the accuracy of the motion vector and the motion vector, wherein encoding the accuracy is performed separately from encoding the motion vector, said encoding the motion vector step encoding the motion vector for each block on a block-by-block basis, said searching and said compensating steps are performed by selecting a filter from a plurality of interpolation filters according to the accuracy, wherein the accuracy is set on a frame basis such that the same accuracy is used for every motion vector in a frame but a different accuracy can be used for a different frame.
28. A method of decoding, at a video decoder, motion-compensated coded data obtained by encoding an image frame block-by-block, comprising: decoding a motion vector that represents an amount of movement from a corresponding position of a reference frame in an objective current block for each of the blocks included in the coded data obtained, decoding an accuracy of the motion vector, the motion vector having fractional accuracy expressed by 1/N pel (where N is an integer ≧2), compensating a motion using the accuracy and the motion vector, said decoding the accuracy step being performed separately from said decoding the motion vector step, said decoding the motion vector step being performed on a block-by-block basis, said motion compensation step being performed by selecting a filter from a plurality of interpolation filters according to the accuracy, wherein the accuracy is set on a frame basis such that the same accuracy is used for every motion vector in a frame but a different accuracy can be used for a different frame.